Let's wrap up this Deep Learning section by taking a quick look at the effectiveness of Neural Nets!
We'll use the Bank Note Authentication Data Set from the UCI repository.
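The data consists of 5 columns: four continuous features extracted from images of the notes (variance, skewness, and curtosis of the wavelet-transformed image, plus image entropy) and a Class label.
Class indicates whether or not a Bank Note was authentic.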
A binary classification task like this is well suited to Neural Networks and Deep Learning! Just follow the instructions below to get started!
In [1]:
import pandas as pd
In [3]:
data = pd.read_csv('bank_note_data.csv')
Check the head of the Data
In [61]:
data.head()
Out[61]:
In [67]:
import seaborn as sns
%matplotlib inline
Create a Countplot of the Classes (Authentic 1 vs Fake 0)
In [68]:
sns.countplot(x='Class',data=data)
Out[68]:
Create a PairPlot of the Data with Seaborn, set Hue to Class
In [69]:
sns.pairplot(data,hue='Class')
Out[69]:
In [71]:
from sklearn.preprocessing import StandardScaler
Create a StandardScaler() object called scaler.
In [72]:
scaler = StandardScaler()
Fit scaler to the features.
In [73]:
scaler.fit(data.drop('Class',axis=1))
Out[73]:
Use the .transform() method to transform the features to a scaled version.
In [74]:
scaled_features = scaler.transform(data.drop('Class',axis=1))
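As a rough sketch of what the scaler computes: each feature column is shifted to zero mean and divided by its standard deviation. The pure-Python example below uses made-up values and a hypothetical helper named standardize; it is not the library's implementation. StandardScaler divides by the population standard deviation, which statistics.pstdev matches.

```python
from statistics import mean, pstdev

def standardize(column):
    """Return (x - mean) / std for each value, using the population std."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]

# Made-up values standing in for one feature column (not from the data set).
toy_column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = standardize(toy_column)

print(scaled)
print(mean(scaled), pstdev(scaled))  # should be close to 0 and 1
```

After scaling, every column has mean 0 and standard deviation 1, so no single feature dominates the network's inputs just because of its units.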
Convert the scaled features to a dataframe and check the head of this dataframe to make sure the scaling worked.
In [77]:
df_feat = pd.DataFrame(scaled_features,columns=data.columns[:-1])
df_feat.head()
Out[77]:
In [79]:
X = df_feat
In [80]:
y = data['Class']
Use the .to_numpy() method on X and y (older pandas versions used .as_matrix(), which has since been removed) and reset them equal to this result. We need to do this in order for TensorFlow to accept the data in Numpy array form instead of a pandas DataFrame or Series.
In [81]:
X = X.to_numpy()
y = y.to_numpy()
Use SciKit Learn to create training and testing sets of the data as we've done in previous lectures:
In [45]:
from sklearn.model_selection import train_test_split
In [46]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
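To make the split less of a black box, here is a minimal pure-Python sketch of a shuffled hold-out split. The helper simple_train_test_split, the seed value, and the toy rows are all hypothetical, and the rounding of the test size is an assumption rather than a statement about scikit-learn's internals. In scikit-learn itself, passing random_state to train_test_split makes the split repeatable in a similar way.

```python
import math
import random

def simple_train_test_split(rows, labels, test_size=0.3, seed=101):
    """Shuffle row indices, then hold out a test_size fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)           # seeded, so repeatable
    n_test = math.ceil(test_size * len(rows))      # assumption: round up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )

# Toy data: 10 rows with one feature each, and alternating labels.
toy_rows = [[float(i)] for i in range(10)]
toy_labels = [i % 2 for i in range(10)]
tr_X, te_X, tr_y, te_y = simple_train_test_split(toy_rows, toy_labels)

print(len(tr_X), len(te_X))
```

Because the shuffle is random, the evaluation numbers later in the notebook will vary from run to run unless the seed is fixed.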
In [82]:
import tensorflow.contrib.learn.python.learn as learn
Create an object called classifier which is a DNNClassifier from learn. Set it to have 2 classes and a [10,20,10] hidden unit layer structure:
In [83]:
classifier = learn.DNNClassifier(hidden_units=[10, 20, 10], n_classes=2)
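To get a feel for how small this network is, the sketch below lists the dense layers implied by hidden_units=[10, 20, 10] and counts their weights and biases. It assumes 4 input features (the 5 columns minus Class) and 2 output units, one per class; the real classifier may use a single output for binary problems, so treat the total as an estimate. The helper names are hypothetical.

```python
def dense_layer_shapes(n_inputs, hidden_units, n_outputs):
    """Return (inputs, outputs) pairs for each fully connected layer."""
    sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))

def count_parameters(shapes):
    """Each layer has n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)

# Assumptions: 4 input features and a 2-unit output layer.
shapes = dense_layer_shapes(n_inputs=4, hidden_units=[10, 20, 10], n_outputs=2)
n_params = count_parameters(shapes)

print(shapes)
print(n_params)
```

With only a few hundred parameters, a model of this size trains in seconds on a data set like this one.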
Now fit classifier to the training data. Use steps=200 with a batch_size of 20. You can play around with these values if you want!
Note: Ignore any warnings you get, they won't affect your output.
In [94]:
classifier.fit(X_train, y_train, steps=200, batch_size=20)
Out[94]:
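A quick back-of-the-envelope check shows what steps=200 and batch_size=20 mean in terms of passes over the training data. The row count of 1372 is an assumption taken from the UCI listing for this data set (it is not stated above), and the rounding of the test size is likewise assumed.

```python
import math

n_rows = 1372      # assumption: row count from the UCI listing
test_size = 0.3    # same fraction passed to train_test_split
steps = 200
batch_size = 20

n_test = math.ceil(test_size * n_rows)   # assumption: test size rounds up
n_train = n_rows - n_test
examples_seen = steps * batch_size       # one batch is consumed per step
approx_epochs = examples_seen / n_train

print(n_train, examples_seen, round(approx_epochs, 2))
```

Under these assumptions the model sees each training row only about four times, which is worth keeping in mind when you experiment with larger values.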
In [95]:
note_predictions = classifier.predict(X_test)
Now create a classification report and a Confusion Matrix. Does anything stand out to you?
In [96]:
from sklearn.metrics import classification_report,confusion_matrix
In [97]:
print(confusion_matrix(y_test,note_predictions))
In [98]:
print(classification_report(y_test,note_predictions))
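To help read these two outputs, here is a small pure-Python sketch of how a binary confusion matrix and the precision, recall, and F1 figures relate. The labels are made up for illustration and confusion_counts is a hypothetical helper; the layout (rows are true classes, columns are predicted classes) follows scikit-learn's convention.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows = true class, columns = predicted."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Made-up labels (0 = fake, 1 = authentic), not results from this notebook.
toy_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

matrix = confusion_counts(toy_true, toy_pred)
(tn, fp), (fn, tp) = matrix

precision = tp / (tp + fp)   # of notes predicted authentic, how many were
recall = tp / (tp + fn)      # of authentic notes, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(matrix)
print(precision, recall, f1)
```

The off-diagonal entries are the mistakes: the top-right cell counts fakes accepted as authentic, and the bottom-left cell counts authentic notes rejected as fake.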
In [99]:
from sklearn.ensemble import RandomForestClassifier
In [100]:
rfc = RandomForestClassifier(n_estimators=200)
In [101]:
rfc.fit(X_train,y_train)
Out[101]:
In [102]:
rfc_preds = rfc.predict(X_test)
In [103]:
print(classification_report(y_test,rfc_preds))
In [104]:
print(confusion_matrix(y_test,rfc_preds))
The random forest should also have done very well, but perhaps not quite as well as the DNN model. Exact numbers depend on the random train/test split. Hopefully you have seen the power of DNNs!